Abstract: This paper, mostly tutorial in nature, deals with the
problem of characterizing the capacity of fading channels in the
high signal-to-noise ratio (SNR) regime. We focus on the practically
relevant noncoherent setting, where neither transmitter nor receiver
knows the channel realizations, but both are aware of the channel law.
We present, in an intuitive and accessible form, two tools, first
proposed by Lapidoth & Moser (2003), of fundamental importance to
high-SNR capacity analysis: the duality approach and the
escape-to-infinity property of capacity-achieving distributions.
Furthermore, we apply these tools to refine some of the results that
appeared previously in the literature and to simplify the
corresponding proofs.

I. Introduction 

Most wireless communication systems operate in the noncoherent
setting where neither transmitter nor receiver has a priori
information on the realization of the underlying fading channel.
As channel state information is typically acquired by allocating
transmission time and/or bandwidth to channel estimation (a
typical example is the use of pilot symbols [1]), a problem of
significant practical relevance is to determine the optimal amount of
resources to be used for this task. This problem can be addressed
in a fundamental fashion by determining the Shannon capacity
(i.e., the ultimate limit on the rate of reliable communication [2])
in the noncoherent setting. Unfortunately, corresponding analytical
results are exceedingly difficult to obtain, even for simple
channel models [3]; nevertheless, significant progress has been
made during the past few years by studying the capacity behavior
in the asymptotic regimes of high and low signal-to-noise ratio
(SNR). Throughout this paper, we shall deal exclusively with
the high-SNR regime. The capacity behavior at high SNR turns
out to be very sensitive to the channel model used [4]-[6]. In
this paper, we shall focus on a channel model, the correlated
block-fading model [7], [8], that is simple and yet rich enough
to illustrate some of the possible asymptotic dependencies of
capacity on SNR, namely, logarithmic with different pre-log
factors [7], [5], [9], [10], or double-logarithmic [4]. The aim of
this tutorial paper is two-fold:

• We present, in an intuitive and accessible manner, two
tools that turn out to be exceedingly useful in the
characterization of capacity at high SNR: the duality approach
and the escape-to-infinity property of capacity-achieving
distributions. These tools were first introduced in [4].
• We use these tools to refine a result that appeared previously
in [7] and to provide an alternative and much simpler proof
of a result in [9], [10]. Furthermore, we develop insights
into the use of duality by exploiting the geometry of the
correlated block-fading model.

Notation 

Uppercase boldface letters denote matrices and lowercase
boldface letters designate vectors. Uppercase sans-serif letters
(e.g., $\mathsf{Q}$) denote probability distributions,¹ while lowercase
sans-serif letters (e.g., $\mathsf{r}$) are reserved for probability density
functions. The superscripts $^T$ and $^H$ stand for transposition and
Hermitian transposition, respectively. We denote the identity matrix of
dimension $N \times N$ by $\mathbf{I}_N$; $\operatorname{diag}\{\mathbf{a}\}$ is the diagonal square matrix
whose main diagonal contains the entries of the vector $\mathbf{a}$, and
$\lambda_q(\mathbf{A})$ stands for the $q$th largest eigenvalue of the Hermitian
positive-semidefinite matrix $\mathbf{A}$. For a random vector $\mathbf{x}$ with
distribution $\mathsf{Q}$, we write $\mathbf{x} \sim \mathsf{Q}$. We denote expectation by $\mathbb{E}[\cdot]$,
and use the notation $\mathbb{E}_{\mathbf{x}}[\cdot]$ or $\mathbb{E}_{\mathsf{Q}}[\cdot]$ to stress that expectation
is taken with respect to $\mathbf{x} \sim \mathsf{Q}$. We write $D(\mathsf{Q}(\cdot)\,\|\,\mathsf{R}(\cdot))$ for
the relative entropy between the distributions $\mathsf{Q}$ and $\mathsf{R}$ [2,
Sec. 8.5]. Furthermore, $\mathcal{CN}(\mathbf{0}, \mathbf{R})$ stands for the distribution
of a circularly-symmetric [11, Def. 24.3.2] complex Gaussian
random vector with covariance matrix $\mathbf{R}$. For two functions $f(x)$
and $g(x)$, the notation $f(x) = \mathcal{O}(g(x))$, $x \to \infty$, means that
$\limsup_{x\to\infty} |f(x)/g(x)| < \infty$, and $f(x) = o(g(x))$, $x \to \infty$,
means that $\lim_{x\to\infty} |f(x)/g(x)| = 0$. Finally, $\log(\cdot)$ indicates
the natural logarithm.

A. The Channel Model 

In our quest for simplicity of exposition, we chose to focus on
the correlated block-fading channel model [7], [8]. In this model,
the channel changes in an independent fashion across blocks of
$N$ discrete-time samples and exhibits correlated fading within
each block (with the same fading statistics for all blocks). The
input-output (IO) relation corresponding to one such block is
given by

$$\mathbf{y} = \operatorname{diag}\{\mathbf{h}\}\,\mathbf{x} + \mathbf{w}. \tag{1}$$

¹We will refer to probability distributions simply as distributions in the
remainder of the paper.



Here, $\mathbf{x} = [x_1 \,\cdots\, x_N]^T \in \mathbb{C}^N$ is the (random) input vector,
which we assume to satisfy the average-power constraint

$$\frac{1}{N}\,\mathbb{E}\bigl[\|\mathbf{x}\|^2\bigr] \le \rho. \tag{2}$$

The vector $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_N)$ represents additive white Gaussian
noise (AWGN), and $\mathbf{h} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$ contains the fading channel
coefficients. The vectors $\mathbf{x}$, $\mathbf{h}$, and $\mathbf{w}$ are mutually independent.
We assume that $\mathbf{R}$ has rank $Q$ ($1 \le Q \le N$) and that the
main-diagonal entries of $\mathbf{R}$ are all equal to 1. Throughout the paper, we
consider the noncoherent setting where transmitter and receiver
know the statistics of $\mathbf{h}$, but not its realizations.
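The block model in (1) can be made concrete with a small simulation. The following is a minimal sketch, not part of the original development: it assumes that the covariance is supplied through a hypothetical factor matrix `factor` of size N x Q with R = factor factor^H (rows of unit norm give unit diagonal entries), and the function names are illustrative only.

```python
import random

def complex_gaussian(rng: random.Random) -> complex:
    """One sample of CN(0, 1): independent real/imaginary parts of variance 1/2."""
    sigma = 0.5 ** 0.5
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def simulate_block(x, factor, rng):
    """Return (y, h) for one block of y = diag{h} x + w.

    `factor` is an N x Q list of rows (an assumed input format), so that
    h = factor g with g having Q i.i.d. CN(0, 1) entries, i.e. h ~ CN(0, R)
    with R = factor factor^H of rank at most Q.
    """
    n_block, rank = len(factor), len(factor[0])
    g = [complex_gaussian(rng) for _ in range(rank)]
    h = [sum(factor[i][q] * g[q] for q in range(rank)) for i in range(n_block)]
    w = [complex_gaussian(rng) for _ in range(n_block)]
    y = [h[i] * x[i] + w[i] for i in range(n_block)]
    return y, h

rng = random.Random(0)
ones_factor = [[1.0], [1.0], [1.0], [1.0]]  # Q = 1: R is the all-ones matrix
y, h = simulate_block([1.0, 1.0, 1.0, 1.0], ones_factor, rng)
```

With the rank-one factor used here, all entries of `h` coincide, which is the piecewise-constant block-fading special case.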

The model we just described may seem contrived at first sight.
Yet, it is of practical relevance for at least two reasons. First, it
captures the essence of channel variations (in time) in an accurate
but simple way: the rank $Q$ of $\mathbf{R}$ corresponds to the minimum
number of entries of $\mathbf{h}$ that need to be known to perfectly recover
the whole vector (in the absence of noise); therefore, larger
$Q$ corresponds to faster channel variation. Second, when $\mathbf{R}$ is
circulant, the IO relation (1) coincides with the IO relation, in
the frequency domain, of a cyclic-prefix orthogonal
frequency-division multiplexing system [12] operating over a
frequency-selective channel with $Q$ uncorrelated taps. In other words, the
model in (1) can be thought of as the dual of the widely used
intersymbol-interference channel model. Independence across
blocks is a sensible assumption for systems employing
time-division multiple access or frequency hopping [13]. Finally, we
remark that for the special case $Q = 1$, the channel model in (1)
reduces to the piecewise-constant block-fading channel model
previously used in numerous papers such as [3], [10].

B. Channel Capacity 

The capacity of the channel in (1) is given by

$$C(\rho) = \frac{1}{N} \sup_{\mathsf{Q}} I(\mathbf{x}; \mathbf{y}). \tag{3}$$

Here, $I(\mathbf{x}; \mathbf{y})$ denotes the mutual information [2, Sec. 8.5]
between $\mathbf{x}$ and $\mathbf{y}$ in (1), and the supremum is taken over all
distributions $\mathsf{Q}$ on $\mathbf{x}$ that satisfy the average-power constraint (2).
Because the variance of the entries of $\mathbf{h}$ and $\mathbf{w}$ is normalized to
one, we can interpret $\rho$ as the receive SNR.

The literature is essentially void of analytic expressions for
$C(\rho)$, even for the simplest case $N = 1$. Nevertheless, as we
shall see in the next section, the high-SNR behavior of $C(\rho)$ can
be characterized fairly well.

C. Known Results and Our Contributions 

For the general case $1 \le Q \le N$, Liang and Veeravalli
showed that [7, Props. 3 and 4]

$$C(\rho) = \frac{N-Q}{N} \log\rho + \mathcal{O}(\log\log\rho), \quad \rho \to \infty. \tag{4}$$

This result is sufficient to characterize the capacity pre-log
$\chi$, defined as the asymptotic ratio between capacity and the
logarithm of SNR as SNR goes to infinity:

$$\chi = \lim_{\rho\to\infty} \frac{C(\rho)}{\log\rho}.$$

The pre-log can be interpreted as the fraction of signal-space
dimensions that can be used for communication. From (4) we
find the pre-log to be given by the difference of two terms, i.e.,
$\chi = 1 - Q/N$. The first term can be thought of as the capacity
pre-log when the channel is known perfectly at the receiver (in this
case, $\chi = 1$ [14]); the second term quantifies the loss in
signal-space dimensions due to the lack of channel knowledge. Note
that $Q/N$ is the smallest fraction of entries of the $N$-dimensional
vector $\mathbf{h}$ that need to be known to reconstruct the whole vector
in the absence of noise.² Hence, we can further interpret the
penalty term $Q/N$ as the fraction of signal-space dimensions in
which pilot symbols need to be transmitted to allow the receiver
to learn the channel.

²As we shall see, neglecting additive noise in (1) yields useful insights on the
capacity pre-log.

When $Q = N$, i.e., the channel correlation matrix has full
rank, (4) implies that the pre-log is equal to 0. It turns out that in
this case the $\mathcal{O}(\log\log\rho)$ term in (4) is tight and capacity grows
double-logarithmically in SNR. This surprising result was proven
in [7, Lem. 5]. In Section III-B we shall refine the result in [7,
Lem. 5] by providing the following, more accurate, high-SNR
capacity characterization:

$$C(\rho) = \log\log\rho - \gamma - 1 - \frac{1}{N} \sum_{q=1}^{N} \log\lambda_q(\mathbf{R}) + o(1), \quad \rho \to \infty. \tag{5}$$

This result characterizes capacity (for $Q = N$) up to a $o(1)$ term
(i.e., a term that vanishes as $\rho \to \infty$). In contrast, the expression
provided in [7, Lem. 5] agrees with capacity only up to a $\mathcal{O}(1)$
term (i.e., a term that is bounded as $\rho \to \infty$).

The most important tool in the proof of (5) is the duality
approach, a technique first introduced in [4] to characterize
the capacity of stationary ergodic fading channels with finite
differential entropy rate. The essence of the duality approach
is that it allows one to obtain tight upper bounds on $C(\rho)$ by
choosing appropriate distributions on the output $\mathbf{y}$. Compared
to the treatment in [4], our goal in Sections II-A and III-B is to
provide the simplest and most accessible proofs for the main
results underlying the duality approach. This comes at the cost
of generality (in terms of noise and fading statistics).

While finding a capacity characterization that, like (5), is
tight up to a $o(1)$ term for all $Q$ with $1 < Q < N$ is an interesting
open problem, for the special case $Q = 1$ (with $Q < N$) the
following result was reported in [10], [9]:

$$C(\rho) = \frac{N-1}{N} \bigl[\log\rho + \log N - \gamma - 1\bigr] - \frac{\log\Gamma(N)}{N} + o(1), \quad \rho \to \infty. \tag{6}$$

Here, $\gamma$ denotes the Euler-Mascheroni constant, and $\Gamma(\cdot)$ is the
Gamma function [4, Eq. (197)]. The proof of (6) provided in [10]
is based on a rather technical argument and does not seem to
explicitly exploit the geometry in the problem, i.e., the fact that $\mathbf{x}$
and $\mathbf{y}$ are collinear in the absence of noise. The proof in [9] does
exploit this geometry through an apposite change of variables,
and applies to the multiple-antenna setting as well.

In Section III-A we present a simple, alternative proof of (6)
that, differently from the proofs in [10], [9], is based on duality
and exploits the geometry in the problem to motivate the choice
of the output distribution. Our proof needs another tool put
forward in [4]: the escape-to-infinity property of the
capacity-achieving distribution. This property, which we review in
Section II-B, allows one to restrict the maximization in (3) to a
smaller set of distributions.
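To get a feel for the quantities discussed in this section, the pre-log $1 - Q/N$ and the leading terms of the full-rank expansion (5) can be evaluated numerically. The sketch below is illustrative only; the function names are hypothetical, and the eigenvalues of $\mathbf{R}$ are assumed to be supplied directly as a list.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def prelog(n_block: int, rank: int) -> float:
    """Capacity pre-log 1 - Q/N implied by (4)."""
    if not 1 <= rank <= n_block:
        raise ValueError("need 1 <= Q <= N")
    return 1.0 - rank / n_block

def full_rank_expansion(rho: float, eigenvalues: list[float]) -> float:
    """RHS of (5) without the o(1) term; `eigenvalues` are those of R (Q = N)."""
    n_block = len(eigenvalues)
    correction = sum(math.log(lam) for lam in eigenvalues) / n_block
    return math.log(math.log(rho)) - EULER_GAMMA - 1.0 - correction
```

For example, `prelog(4, 1)` returns 0.75, the piecewise-constant case with blocks of four samples. Since the eigenvalues of a full-rank $\mathbf{R}$ with unit diagonal sum to $N$, the correlation term in (5) can only increase the constant relative to the i.i.d. case.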

II. The Toolbox 

A. The Duality Approach 

To prove (5) and (6), we sandwich capacity between a lower
and an upper bound that agree up to a $o(1)$ term. Establishing
capacity lower bounds is, in principle, relatively simple: it
suffices to evaluate the mutual information in (3) for an input
distribution $\mathsf{Q}$ that satisfies the average-power constraint. Obviously,
care must be exercised in choosing $\mathsf{Q}$, so as to ensure that the
resulting bound is tight in the limit $\rho \to \infty$ (see Section III-A2
for a concrete example).

Capacity upper bounds are more difficult to find because of the
need for maximization over the set of eligible input distributions.
To single out the main difficulty with this optimization problem,
it is convenient to denote the conditional distribution of $\mathbf{y}$ given $\mathbf{x}$
as $\mathsf{W}(\cdot\,|\,\mathbf{x})$ and to use the symbol $\mathsf{QW}$ to indicate the distribution
induced on $\mathbf{y}$ by the input distribution $\mathsf{Q}$ and by the "channel"
$\mathsf{W}(\cdot\,|\,\mathbf{x})$. By the definition of mutual information [2, Sec. 8.5]
we have that

$$I(\mathbf{x}; \mathbf{y}) = \mathbb{E}_{\mathsf{Q}}\bigl[D\bigl(\mathsf{W}(\cdot\,|\,\mathbf{x}) \,\|\, (\mathsf{QW})(\cdot)\bigr)\bigr]. \tag{7}$$

As the right-hand side (RHS) of (7) is a rather complicated
function of $\mathsf{Q}$, the maximization in (3) is difficult to carry out. The
idea behind duality is to upper-bound the RHS of (7) by replacing
$\mathsf{QW}$ by a distribution that does not depend on $\mathsf{Q}$. Concretely, let
$\mathsf{R}$ be an arbitrary distribution on $\mathbf{y}$. Then

$$\begin{aligned}
I(\mathbf{x}; \mathbf{y}) &\overset{(a)}{=} \mathbb{E}_{\mathsf{Q}}\bigl[D\bigl(\mathsf{W}(\cdot\,|\,\mathbf{x}) \,\|\, \mathsf{R}(\cdot)\bigr)\bigr] - D\bigl((\mathsf{QW})(\cdot) \,\|\, \mathsf{R}(\cdot)\bigr) \\
&\overset{(b)}{\le} \mathbb{E}_{\mathsf{Q}}\bigl[D\bigl(\mathsf{W}(\cdot\,|\,\mathbf{x}) \,\|\, \mathsf{R}(\cdot)\bigr)\bigr].
\end{aligned} \tag{8}$$

Here, (a) follows from Topsøe's identity [15] and (b) is a
consequence of the nonnegativity of relative entropy [2, Thm. 2.6.3].
The RHS of (8) is easier to deal with than $I(\mathbf{x}; \mathbf{y})$. In fact, as
we shall illustrate in Sections III-A and III-B, it is possible,
for an appropriate choice of $\mathsf{R}$, to find an asymptotically tight
upper bound on $\mathbb{E}_{\mathsf{Q}}[D(\mathsf{W}(\cdot\,|\,\mathbf{x}) \,\|\, \mathsf{R}(\cdot))]$ that holds for every $\mathsf{Q}$
satisfying the average-power constraint (2). By (3), this upper
bound constitutes an upper bound on $C(\rho)$.
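The two steps in (8) can be checked numerically on a toy example. The sketch below uses a small discrete channel rather than the fading channel in (1), purely for illustration; the channel matrix, input distribution, and function names are assumptions chosen for the example.

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) in nats for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def output_dist(q_in, channel):
    """Distribution QW induced on the output by input q_in and rows W(.|x)."""
    n_out = len(channel[0])
    return [sum(q_in[x] * channel[x][y] for x in range(len(q_in)))
            for y in range(n_out)]

def mutual_information(q_in, channel):
    """I(x; y) computed as the expected relative entropy in (7)."""
    qw = output_dist(q_in, channel)
    return sum(q_in[x] * kl(channel[x], qw) for x in range(len(q_in)))

def duality_bound(q_in, channel, r_out):
    """RHS of (8): E_Q[D(W(.|x) || R)] for an arbitrary output distribution R."""
    return sum(q_in[x] * kl(channel[x], r_out) for x in range(len(q_in)))

q_in = [0.3, 0.7]
channel = [[0.9, 0.1], [0.2, 0.8]]  # toy binary channel, purely illustrative
r_out = [0.5, 0.5]                  # arbitrary choice of output distribution
mi = mutual_information(q_in, channel)
bound = duality_bound(q_in, channel, r_out)
gap = kl(output_dist(q_in, channel), r_out)
```

Here `bound - gap` reproduces `mi` (step (a)), and `bound` never falls below `mi` (step (b)); choosing `r_out` equal to the induced output distribution closes the gap.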



As a side remark, we note that the inequality in (8) holds with
equality when $\mathsf{R}$ coincides with $\mathsf{QW}$. Hence, (8) yields the
following expression for mutual information:

$$I(\mathbf{x}; \mathbf{y}) = \inf_{\mathsf{R}} \mathbb{E}_{\mathsf{Q}}\bigl[D\bigl(\mathsf{W}(\cdot\,|\,\mathbf{x}) \,\|\, \mathsf{R}(\cdot)\bigr)\bigr]. \tag{9}$$

Through further manipulations (see [16], [4] for details), the
identity (9) yields a dual expression for capacity, with the
maximization over the input distribution in (3) replaced by
a minimization over the output distribution. This is why the
technique is referred to as the duality approach.

An appropriate choice of the output distribution $\mathsf{R}$ is crucial
for the bound in (8) to be tight. Throughout the paper, the output
distribution $\mathsf{R}$ with density

$$\mathsf{r}(\mathbf{y}) = \frac{\Gamma(N)}{\pi^N \beta^{\alpha}\, \Gamma(\alpha)}\, \|\mathbf{y}\|^{2(\alpha - N)}\, e^{-\|\mathbf{y}\|^2/\beta}, \quad \mathbf{y} \in \mathbb{C}^N \tag{10}$$

will play a prominent role. Here, $\beta = N(\rho + 1)/\alpha$, where $\alpha$ is
a free parameter whose meaning will become clear later. This
output distribution was put forward in [4] in a more general
setting. The main features of this distribution are that $\mathbf{y}$ is
isotropically distributed and that $\|\mathbf{y}\|^2$ is Gamma distributed
with parameter $\alpha$. In Section III-A, we will show that, for the
piecewise-constant block-fading channel model (i.e., $Q = 1$),
this choice for the output distribution can be motivated through
simple geometric intuition.
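The claim that $\|\mathbf{y}\|^2$ is Gamma distributed under (10) can be verified with a short computation: multiplying the isotropic density by the volume factor $\pi^N t^{N-1}/\Gamma(N)$ of the shell $\|\mathbf{y}\|^2 = t$ should give a Gamma density with shape $\alpha$ and scale $\beta$. The following sketch checks this in the log domain; the function names are hypothetical and the parameter values in any call are arbitrary.

```python
import math

def log_output_density(norm_sq: float, n_block: int, rho: float, alpha: float) -> float:
    """log r(y) from (10), as a function of ||y||^2 only (the density is isotropic)."""
    beta = n_block * (rho + 1.0) / alpha
    return (math.lgamma(n_block) - n_block * math.log(math.pi)
            - alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - n_block) * math.log(norm_sq) - norm_sq / beta)

def log_gamma_pdf(t: float, alpha: float, beta: float) -> float:
    """Log-density of a Gamma random variable with shape alpha and scale beta."""
    return ((alpha - 1.0) * math.log(t) - t / beta
            - alpha * math.log(beta) - math.lgamma(alpha))

def log_shell_volume(t: float, n_block: int) -> float:
    """Log of pi^N t^(N-1) / Gamma(N), the Jacobian from y in C^N to t = ||y||^2."""
    return (n_block * math.log(math.pi) + (n_block - 1) * math.log(t)
            - math.lgamma(n_block))
```

Adding `log_output_density` and `log_shell_volume` at the same `t` should match `log_gamma_pdf` with `beta = N (rho + 1) / alpha`, which also confirms that (10) is properly normalized.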

B. Escape-To-Infinity Property 

Duality simplifies the maximization over the input distribution
in (3), at the cost of getting an upper bound on capacity. This
simplification, together with an appropriate choice of $\mathsf{Q}$ to obtain
a matching capacity lower bound, is enough to establish (5), as
we shall see in Section III-B. To prove (6), however, we need
an additional tool. Specifically, we will make use of the fact
that the asymptotic behavior of $C(\rho)$ does not change if we
constrain the input distributions $\mathsf{Q}$ in (3) to be supported strictly
outside a sphere of arbitrarily large radius. We formalize this
result, which turns out to hold for almost all wireless channel
models of practical interest [4], [17], in the following theorem.
In view of (6), we focus on the case $Q = 1$.

Theorem 1: Fix an arbitrary $\rho_0 > 0$ and let $\mathcal{K} = \{\mathbf{x} \in \mathbb{C}^N : \|\mathbf{x}\|^2 < \rho_0\}$.
Denote by $C(\rho)$ the capacity of the channel with IO
relation (1) (with $Q = 1$) under the average-power constraint (2).
Furthermore, denote by $C_{\mathcal{K}}(\rho)$ the capacity of the same channel
under the additional constraint (besides (2)) that $\mathbf{x} \notin \mathcal{K}$ with
probability one (w.p.1). Then

$$\lim_{\rho\to\infty} \bigl\{ C(\rho) - (1 - 1/N) \log\rho \bigr\} = \lim_{\rho\to\infty} \bigl\{ C_{\mathcal{K}}(\rho) - (1 - 1/N) \log\rho \bigr\}.$$

Proof: The high-SNR capacity expansion (4) implies that
the capacity pre-log is $1 - 1/N$.³ The logarithmic growth of
capacity in SNR allows us to invoke [17, Thm. 8] and conclude
that the capacity-achieving input distribution must escape to
infinity [4, Def. 4.11], i.e., that for all $\rho_0 > 0$ there exists a family
of input distributions $\{\mathsf{Q}_\rho\}_{\rho > 0}$ (parametrized with respect to $\rho$)
satisfying $(1/N)\, \mathbb{E}_{\mathsf{Q}_\rho}[\|\mathbf{x}\|^2] \le \rho$, such that, when $\mathbf{x} \sim \mathsf{Q}_\rho$,

$$\lim_{\rho\to\infty} \Bigl\{ C(\rho) - \frac{1}{N} I(\mathbf{x}; \mathbf{y}) \Bigr\} = 0$$

and

$$\lim_{\rho\to\infty} \mathbb{P}\bigl\{ \|\mathbf{x}\|^2 < \rho_0 \bigr\} = 0.$$

The proof is concluded by noting that the escape-to-infinity
property is a sufficient condition for Theorem 1 to hold, as a
consequence of [4, Thm. 4.12]. ■

³It is worth mentioning that the proof of (4) does not make use of Theorem 1,
so there is no circular argument here.

III. High-SNR Capacity Asymptotics
A. The Rank-One Case 

When $Q = 1$, we can rewrite (1) in the following (more
convenient) form

$$\mathbf{y} = s\,\mathbf{x} + \mathbf{w} \tag{11}$$

where $s \sim \mathcal{CN}(0, 1)$. The high-SNR capacity expansion (4)
implies that the capacity pre-log of the channel in (11) is given
by $1 - 1/N$. This is in agreement with the intuition we provided
in Section I-C: one pilot symbol per block is enough to learn
the channel in the absence of noise. We next provide a different
interpretation of this result, which is of geometric nature and
sheds light on how to select input and output distributions to get
capacity bounds that are tight as $\rho \to \infty$.

1) Geometric Intuition: Let $\mathbf{x}$ be an arbitrary vector in $\mathbb{C}^N$.
This vector can be specified by identifying i) the linear subspace
spanned by $\mathbf{x}$, i.e., the complex line passing through the origin
and $\mathbf{x}$, and ii) the point on that line corresponding to $\mathbf{x}$ (i.e., a
complex number). If we neglect additive noise, the IO relation
in (11) reduces to $\mathbf{y} = s\,\mathbf{x}$. As $s$ varies, $\mathbf{y}$ spans the line $\mathbf{x}$
lies on. In other words, as pointed out in [9], the random
channel coefficient $s$ destroys the information about $\mathbf{x}$ specified
in the second step of our description above, but leaves the
information about the linear subspace spanned by $\mathbf{x}$ unchanged.
To summarize, when the random channel coefficient $s$ is not
known to the receiver, the information that the receiver can
recover about the transmitted signal $\mathbf{x}$ is the line on which $\mathbf{x}$
lies. But a complex line in $\mathbb{C}^N$ is fully characterized by $N - 1$
complex parameters.⁴ Hence, the received signal "carries" $N - 1$
parameters describing $\mathbf{x}$. This number, divided by $N$, coincides
with the capacity pre-log.
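The geometric statement that, without noise, the output stays on the complex line spanned by the input can be illustrated numerically through the Cauchy-Schwarz inequality, which holds with equality exactly for collinear vectors. This is a minimal sketch with hypothetical helper names and an arbitrary random seed.

```python
import math
import random

def inner(a, b):
    """Hermitian inner product <a, b> = sum_i conj(a_i) b_i."""
    return sum(ai.conjugate() * bi for ai, bi in zip(a, b))

def collinearity(x, y):
    """|<x, y>| / (||x|| ||y||); equals 1 exactly when x and y span the same line."""
    return abs(inner(x, y)) / math.sqrt(inner(x, x).real * inner(y, y).real)

rng = random.Random(1)
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
s = complex(rng.gauss(0, 1), rng.gauss(0, 1))   # fading coefficient, unknown to the receiver
y_noiseless = [s * xi for xi in x]              # y = s x, additive noise neglected
x_other = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
```

The value `collinearity(x, y_noiseless)` equals 1 up to rounding regardless of `s`, whereas an unrelated input such as `x_other` yields a value below 1; the receiver can therefore tell lines apart but not points on a line.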

⁴More formally, the set of lines passing through the origin of $\mathbb{C}^N$ forms a
manifold (the complex projective space $\mathbb{CP}^{N-1}$) of $N - 1$ complex
dimensions [18].

2) A Capacity Lower Bound: The geometry unveiled in the
previous section suggests using the direction of $\mathbf{x}$, but not
its magnitude, to convey information. This insight is helpful
in choosing an input distribution that yields a tight capacity
lower bound. Concretely, we take $\mathbf{x} = \sqrt{N\rho}\; \mathbf{u}_x$ where $\mathbf{u}_x$
is uniformly distributed on the unit sphere in $\mathbb{C}^N$. We use this
input distribution, which trivially satisfies the average-power
constraint (2), to lower-bound capacity as follows:

$$\begin{aligned}
N \cdot C(\rho) &\ge I(\mathbf{x}; \mathbf{y}) = h(\mathbf{y}) - h(\mathbf{y}\,|\,\mathbf{x}) \\
&\ge h(\mathbf{y}\,|\,\mathbf{w}) - h(\mathbf{y}\,|\,\mathbf{x}) \\
&= h(\underbrace{s\,\mathbf{x}}_{\triangleq\, \mathbf{r}}) - h(\mathbf{y}\,|\,\mathbf{x}).
\end{aligned} \tag{12}$$

Here, $h(\cdot)$ denotes differential entropy [2, Sec. 8.1], the second
inequality follows because conditioning reduces differential
entropy [2, Sec. 8.6], and the last equality follows because
differential entropy is invariant to translations [2, Thm. 8.6.3] and
$\mathbf{w}$ is independent of $s$ and $\mathbf{x}$. To compute $h(\mathbf{r})$, where $\mathbf{r} = s\,\mathbf{x}$,
it is convenient to switch to polar coordinates, i.e.,
$\mathbf{r} \mapsto (\|\mathbf{r}\|, \mathbf{u}_r)$, where
$\mathbf{u}_r = \mathbf{r}/\|\mathbf{r}\|$. The change of variable theorem then yields [4,
Lem. 6.17]:

$$h(\mathbf{r}) = h(\|\mathbf{r}\|) + h_{\text{sphere}}\bigl(\mathbf{u}_r \,\big|\, \|\mathbf{r}\|\bigr) + (2N - 1)\, \mathbb{E}\bigl[\log\|\mathbf{r}\|\bigr]. \tag{13}$$

Here, $h_{\text{sphere}}(\cdot)$ denotes the differential entropy computed with
respect to the area measure on the unit sphere in $\mathbb{C}^N$ [4, p. 2457].
By the choice of the input distribution, we have $\|\mathbf{r}\| = \sqrt{N\rho}\, |s|$.
Furthermore, because $\mathbf{u}_x$ is uniformly distributed on the unit
sphere and $s$ is circularly symmetric (i.e., the phase of $s$ is
uniformly distributed on $[-\pi, \pi)$ and is independent of $|s|$ [11,
Prop. 24.2.6]), it follows that $\mathbf{r}$ is isotropically distributed [4,
Def. 6.19]. Hence, $\mathbf{u}_r$ is uniformly distributed on the unit sphere
and is independent of $\|\mathbf{r}\|$. Based on these observations, we can
now simplify (13) as follows:



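$$\begin{aligned}
h(\mathbf{r}) &\overset{(a)}{=} h\bigl(\sqrt{N\rho}\, |s|\bigr) + h_{\text{sphere}}(\mathbf{u}_r) + (2N - 1)\, \mathbb{E}\bigl[\log\|\mathbf{r}\|\bigr] \\
&\overset{(b)}{=} \log\sqrt{N\rho} + h(|s|) + \log\frac{2\pi^N}{\Gamma(N)} + (2N - 1) \bigl[\log\sqrt{N\rho} + \mathbb{E}[\log|s|]\bigr] \\
&\overset{(c)}{=} N \log(N\rho) + h(|s|^2) + \log\frac{\pi^N}{\Gamma(N)} + (N - 1)\, \mathbb{E}\bigl[\log|s|^2\bigr] \\
&\overset{(d)}{=} N \log(N\rho) + 1 + \log\frac{\pi^N}{\Gamma(N)} - (N - 1)\gamma.
\end{aligned} \tag{14}$$

Here, in (a) we used the independence of $\mathbf{u}_r$ and $\|\mathbf{r}\|$ to drop
conditioning in the second term on the RHS of (13). In (b) we
used that $h(ax) = \log a + h(x)$ for $x$ a real-valued random
variable and $a$ real and positive; we also used that $\mathbf{u}_r$ is
uniformly distributed on the unit sphere in $\mathbb{C}^N$, and that, as a
consequence, $h_{\text{sphere}}(\mathbf{u}_r)$ is equal to the logarithm of the area of
that sphere, which is $2\pi^N/\Gamma(N)$. In (c) we used that

$$h(v) = h(v^2) - \mathbb{E}[\log v] - \log 2$$

for every real nonnegative random variable $v$ [4, Lem. 6.15],
and (d) follows because $\mathbb{E}[\log|s|^2] = -\gamma$ and $h(|s|^2) = 1$ for
$s \sim \mathcal{CN}(0, 1)$.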



Since $\mathbf{h}$ is a circularly-symmetric complex Gaussian vector,
$h(\mathbf{y}\,|\,\mathbf{x})$ on the RHS of (12) admits the following closed-form
expression:

$$\begin{aligned}
h(\mathbf{y}\,|\,\mathbf{x}) &= \log(\pi e)^N + \mathbb{E}\bigl[\log(1 + \|\mathbf{x}\|^2)\bigr] \\
&= \log(\pi e)^N + \log(1 + N\rho) \\
&= \log(\pi e)^N + \log(N\rho) + o(1), \quad \rho \to \infty.
\end{aligned} \tag{15}$$

Substituting (14) and (15) into (12), we get a lower bound on
$C(\rho)$ that coincides with the RHS of (6).
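The substitution of (14) and (15) into (12) can be double-checked numerically. The sketch below (hypothetical function names) evaluates the lower bound with the exact second line of (15) and compares it with the leading terms of (6); the two should differ only by a term that vanishes as the SNR grows.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def lower_bound(n_block: int, rho: float) -> float:
    """(1/N) [h(s x) - h(y|x)] from (12), using (14) and the exact line of (15)."""
    h_r = (n_block * math.log(n_block * rho) + 1.0
           + n_block * math.log(math.pi) - math.lgamma(n_block)
           - (n_block - 1) * EULER_GAMMA)
    h_y_given_x = (n_block * math.log(math.pi * math.e)
                   + math.log(1.0 + n_block * rho))
    return (h_r - h_y_given_x) / n_block

def rank_one_expansion(n_block: int, rho: float) -> float:
    """RHS of (6) without the o(1) term."""
    return ((n_block - 1) / n_block
            * (math.log(rho) + math.log(n_block) - EULER_GAMMA - 1.0)
            - math.lgamma(n_block) / n_block)
```

The difference between the two functions equals $-\log(1 + 1/(N\rho))/N$, which makes explicit how fast the $o(1)$ term of the lower bound decays.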

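3) A Matching Upper Bound: To obtain an upper bound that
matches the lower bound we just found up to a $o(1)$ term, we
use duality and the escape-to-infinity property of the
capacity-achieving distribution. More specifically, as a consequence
of Theorem 1 we can, without loss of generality, constrain
the maximization of mutual information in (3) to input
distributions $\mathsf{Q}$ that satisfy, besides the average-power constraint (2),
the additional constraint $\|\mathbf{x}\|^2 \ge \rho_0$ w.p.1. Here, $\rho_0 > 0$ is a
parameter to be optimized later. We use duality with the density
of $\mathsf{R}$ given by (10) with $\alpha = 1$. This choice is again motivated by
the geometric considerations in Section III-A1: in the noiseless
case, the density of the output distribution induced by the input
distribution used in Section III-A2 (to derive a capacity lower
bound) equals (10) with $\alpha = 1$. Fix $\rho_0 > 0$ and take an arbitrary
input distribution $\mathsf{Q}$ such that $\mathbf{x} \sim \mathsf{Q}$ satisfies (2) and $\|\mathbf{x}\|^2 \ge \rho_0$
w.p.1. By duality, we have that

$$\begin{aligned}
I(\mathbf{x}; \mathbf{y}) &\le \mathbb{E}_{\mathsf{Q}}\bigl[D\bigl(\mathsf{W}(\cdot\,|\,\mathbf{x}) \,\|\, \mathsf{R}(\cdot)\bigr)\bigr] \\
&= \mathbb{E}_{\mathsf{QW}}\Bigl[\log\frac{1}{\mathsf{r}(\mathbf{y})}\Bigr] - h(\mathbf{y}\,|\,\mathbf{x}) \\
&= \log\pi^N + \log[N(\rho + 1)] - \log\Gamma(N) + (N - 1)\, \mathbb{E}_{\mathsf{QW}}\bigl[\log\|\mathbf{y}\|^2\bigr] \\
&\quad + \frac{\mathbb{E}_{\mathsf{QW}}\bigl[\|\mathbf{y}\|^2\bigr]}{N(\rho + 1)} - h(\mathbf{y}\,|\,\mathbf{x}).
\end{aligned} \tag{16}$$

Here, the first equality follows from straightforward algebraic
manipulations; in the second equality we used (10) with $\alpha = 1$.
We shall next evaluate or bound the terms on the RHS of (16)
that depend on $\mathsf{Q}$. First, note that

$$\mathbb{E}_{\mathsf{QW}}\bigl[\|\mathbf{y}\|^2\bigr] = \mathbb{E}_{\mathsf{QW}}\bigl[\|s\,\mathbf{x} + \mathbf{w}\|^2\bigr] \le N(\rho + 1). \tag{17}$$

Here, we used independence of $\mathbf{x}$ and $\mathbf{w}$ and the power
constraint (2). To evaluate $\mathbb{E}_{\mathsf{QW}}[\log\|\mathbf{y}\|^2]$, we proceed as follows:
first note that

$$\mathbb{E}_{\mathsf{QW}}\bigl[\log\|\mathbf{y}\|^2\bigr] = \mathbb{E}_{\mathbf{x}}\bigl[\mathbb{E}_{s,\mathbf{w}}\bigl[\log\|\mathbf{y}\|^2 \,\big|\, \mathbf{x}\bigr]\bigr].$$

We next use that, given $\mathbf{x}$, the random variable $\|\mathbf{y}\|^2$ is distributed
as $\sum_{i=1}^{N-1} |z_i|^2 + (1 + \|\mathbf{x}\|^2)\, |z_N|^2$, where the $z_i$, $i = 1, 2, \ldots, N$,
are i.i.d. $\mathcal{CN}(0, 1)$. This result follows by observing that, given
$\mathbf{x}$, the output vector $\mathbf{y}$ has covariance matrix $\mathbf{x}\mathbf{x}^H + \mathbf{I}_N$ (whose
eigenvalues are $1 + \|\mathbf{x}\|^2$ and 1 with multiplicity $N - 1$).
Using Jensen's inequality with respect to the random
variables $z_1, \ldots, z_{N-1}$, we obtain the following bound:

$$\begin{aligned}
\mathbb{E}_{\mathbf{x}}\bigl[\mathbb{E}_{s,\mathbf{w}}\bigl[\log\|\mathbf{y}\|^2 \,\big|\, \mathbf{x}\bigr]\bigr]
&= \mathbb{E}_{\mathbf{x}}\Bigl[\mathbb{E}_{z_1,\ldots,z_N}\Bigl[\log\Bigl(\sum_{i=1}^{N-1} |z_i|^2 + (1 + \|\mathbf{x}\|^2)\, |z_N|^2\Bigr)\Bigr]\Bigr] \\
&\le \mathbb{E}_{\mathbf{x}}\Bigl[\mathbb{E}_{z_N}\Bigl[\log\bigl(N - 1 + (1 + \|\mathbf{x}\|^2)\, |z_N|^2\bigr)\Bigr]\Bigr] \\
&= \mathbb{E}_{\mathbf{x}}\bigl[\log(1 + \|\mathbf{x}\|^2)\bigr] + \mathbb{E}_{\mathbf{x}}\Bigl[\mathbb{E}_{z_N}\Bigl[\log\Bigl(\frac{N - 1}{1 + \|\mathbf{x}\|^2} + |z_N|^2\Bigr)\Bigr]\Bigr]
\end{aligned} \tag{18}$$

$$\le \mathbb{E}_{\mathbf{x}}\bigl[\log(1 + \|\mathbf{x}\|^2)\bigr] + \sup_{\|\mathbf{x}\|^2 \ge \rho_0} \mathbb{E}_{z_N}\Bigl[\log\Bigl(\frac{N - 1}{1 + \|\mathbf{x}\|^2} + |z_N|^2\Bigr)\Bigr]. \tag{19}$$

In the last step, we upper-bounded the second term on the RHS
of (18) by replacing the expectation over $\mathbf{x}$ by the supremum
over all vectors $\mathbf{x}$ satisfying $\|\mathbf{x}\|^2 \ge \rho_0$. This is the step where
the escape-to-infinity property is used. Without this property,
the supremum in (19) would be over all $\mathbf{x}$ satisfying $\|\mathbf{x}\|^2 \ge 0$
and the resulting bound would not match (up to a $o(1)$ term) the
lower bound obtained in Section III-A2. To evaluate the second
term on the RHS of (19) we use the following lemma.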
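Lemma 2: Let $z \sim \mathcal{CN}(0, 1)$ and take $a > 0$. Then

$$\mathbb{E}_z\bigl[\log(a + |z|^2)\bigr] = \underbrace{e^{a}\, \Gamma(0, a) + \log a}_{\triangleq\, g(a)}.$$

Here, $\Gamma(\cdot, \cdot)$ denotes the incomplete Gamma function [4,
Eq. (200)]. The function $g(a)$ is monotonically increasing in
$a$. Furthermore, $\lim_{a\to 0} g(a) = -\gamma$.

Proof: Let $v = |z|^2$. Then

$$\begin{aligned}
\mathbb{E}_z\bigl[\log(a + |z|^2)\bigr] &= \int_0^{\infty} e^{-v} \log(a + v)\, dv \\
&= e^{a} \int_a^{\infty} e^{-t} \log t\, dt
\end{aligned} \tag{20}$$

$$= e^{a}\, \Gamma(0, a) + \log a. \tag{21}$$

Here, (20) follows from the change of variable $t = a + v$, and to
obtain (21) from (20) we used integration by parts.
Now let us denote the RHS of (21) by $g(a)$. It is easy to verify
that $g(a)$ is a monotonic function of $a > 0$. In fact,

$$\frac{dg(a)}{da} = e^{a}\, \Gamma(0, a)$$

which is nonnegative for all $a > 0$. Finally, the claim that
$\lim_{a\to 0} g(a) = -\gamma$ follows from (20) by setting $a = 0$. ■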
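The function $g(a)$ of Lemma 2 is easy to evaluate numerically. The sketch below computes $\Gamma(0, a)$ through the classical power series of the exponential integral, $\Gamma(0, a) = -\gamma - \log a + \sum_{k \ge 1} (-1)^{k+1} a^k / (k \cdot k!)$; this series is a swapped-in numerical device, not part of the proof above, and in floating point it is adequate only for small to moderate values of $a$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def exp_integral_e1(a: float, terms: int = 60) -> float:
    """Gamma(0, a) via its power series; intended for small/moderate a > 0 only."""
    total, power_over_fact = 0.0, 1.0
    for k in range(1, terms + 1):
        power_over_fact *= -a / k      # now equals (-a)^k / k!
        total -= power_over_fact / k   # accumulates (-1)^(k+1) a^k / (k k!)
    return -EULER_GAMMA - math.log(a) + total

def g(a: float) -> float:
    """g(a) = e^a Gamma(0, a) + log a, i.e. E[log(a + |z|^2)] for z ~ CN(0, 1)."""
    return math.exp(a) * exp_integral_e1(a) + math.log(a)
```

Evaluating `g` on an increasing grid of arguments illustrates both claims of the lemma: the values increase with `a`, and they approach $-\gamma$ as `a` shrinks toward zero.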
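As a consequence of Lemma 2, we have that

$$\sup_{\|\mathbf{x}\|^2 \ge \rho_0} \mathbb{E}_{z_N}\Bigl[\log\Bigl(\frac{N - 1}{1 + \|\mathbf{x}\|^2} + |z_N|^2\Bigr)\Bigr] = g\Bigl(\frac{N - 1}{1 + \rho_0}\Bigr).$$

Finally, for the conditional differential entropy term in (16) we
have

$$h(\mathbf{y}\,|\,\mathbf{x}) = \log(\pi e)^N + \mathbb{E}\bigl[\log(1 + \|\mathbf{x}\|^2)\bigr].$$

To summarize, we proved that

$$I(\mathbf{x}; \mathbf{y}) \le \log(N\rho + N) + (N - 2)\, \mathbb{E}\bigl[\log(1 + \|\mathbf{x}\|^2)\bigr] + (N - 1)\, g\Bigl(\frac{N - 1}{1 + \rho_0}\Bigr) - \log\Gamma(N) - (N - 1).$$

Now, using Jensen's inequality on $\mathbb{E}[\log(1 + \|\mathbf{x}\|^2)]$, we obtain

$$\lim_{\rho\to\infty} \bigl[ I(\mathbf{x}; \mathbf{y}) - (N - 1) \log\rho \bigr] \le (N - 1) \Bigl[\log N - 1 + g\Bigl(\frac{N - 1}{1 + \rho_0}\Bigr)\Bigr] - \log\Gamma(N).$$

The proof is concluded by recalling that, by Theorem 1, the
asymptotic behavior of $C(\rho)$ does not change if we constrain $\mathsf{Q}$
to satisfy $\|\mathbf{x}\|^2 \ge \rho_0$ w.p.1, and by noting that, by Lemma 2, we
can make the term $g((N - 1)/(1 + \rho_0))$ arbitrarily close
to $-\gamma$ by taking $\rho_0$ sufficiently large.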

B. The Full-Rank Case 

Due to space constraints, we shall give only an outline of the
proof of (5) and, furthermore, restrict ourselves to i.i.d. channels,
i.e., $\mathbf{R} = \mathbf{I}_N$. We comment on the general case at the end of the
section.

First, we note that $\mathbf{R} = \mathbf{I}_N$ implies that the channel is
memoryless, and, hence, capacity is achieved by i.i.d. inputs.
As a consequence,

$$\sup_{\mathsf{Q}} I(\mathbf{x}; \mathbf{y}) = N \sup_{\mathsf{Q}} I(x; y). \tag{22}$$

Here, $y = s x + w$ with $s, w \sim \mathcal{CN}(0, 1)$ and the supremum on the
RHS is over the distributions $\mathsf{Q}$ on $x$ that satisfy the average-power
constraint $\mathbb{E}_{\mathsf{Q}}[|x|^2] \le \rho$. The capacity of the memoryless channel
$y = s x + w$ was first proven to grow double-logarithmically
in SNR in [19]. This result was then extended in [4, Thm. 4.2]
to multiple-antenna channels with general stationary ergodic
fading distribution (of finite differential entropy rate) and general
noise distributions. The proof we provide here is based on the
duality technique and is particularly simple, as it exploits the
Gaussianity of the fading distribution. More specifically, we
use duality with the density of $\mathsf{R}$ given in (10), with $N = 1$
and $\alpha = [1 + \log(1 + \rho)]^{-1}$. The choice of $\alpha$ might appear
unmotivated, and in fact, differently from the previous section, it
is hard to find an intuitive explanation for this choice, besides the
fact that it simplifies the proof. Consider an arbitrary $\mathsf{Q}$ satisfying
$\mathbb{E}_{\mathsf{Q}}[|x|^2] \le \rho$; using (8) and (10), we obtain the following upper
bound on $I(x; y)$:

$$\begin{aligned}
I(x; y) &\le \log\pi + \alpha \log(1 + \rho) - \alpha \log\alpha + \log\Gamma(\alpha) \\
&\quad + (1 - \alpha)\, \mathbb{E}_{\mathsf{QW}}\bigl[\log|y|^2\bigr] + \alpha\, \frac{\mathbb{E}_{\mathsf{QW}}\bigl[|y|^2\bigr]}{1 + \rho} - h(y\,|\,x) \\
&\le \log\pi + \alpha[1 + \log(1 + \rho)] - \alpha \log\alpha + \log\Gamma(\alpha) \\
&\quad + \mathbb{E}_{\mathsf{QW}}\bigl[\log|y|^2\bigr] - h(y\,|\,x).
\end{aligned} \tag{23}$$



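The last step follows because $\mathbb{E}_{\mathsf{QW}}[|y|^2] \le 1 + \rho$ and $\alpha < 1$,
by assumption, so that

$$(1 - \alpha)\, \mathbb{E}_{\mathsf{QW}}\bigl[\log|y|^2\bigr] \le \mathbb{E}_{\mathsf{QW}}\bigl[\log|y|^2\bigr].$$

We continue by establishing that

$$\mathbb{E}_{\mathsf{QW}}\bigl[\log|y|^2\bigr] - h(y\,|\,x) = -\gamma - \log\pi - 1.$$

This identity follows because

$$h(y\,|\,x) = \mathbb{E}\bigl[\log(1 + |x|^2)\bigr] + \log(\pi e)$$

and because, given $x$, the random variable $|y|^2$ is distributed as
$(1 + |x|^2)\, |z|^2$ where $z \sim \mathcal{CN}(0, 1)$, so that

$$\begin{aligned}
\mathbb{E}_{\mathsf{QW}}\bigl[\log|y|^2\bigr] &= \mathbb{E}_x\bigl[\mathbb{E}_{s,w}\bigl[\log|y|^2 \,\big|\, x\bigr]\bigr] \\
&= \mathbb{E}_x\bigl[\mathbb{E}_z\bigl[\log\bigl((1 + |x|^2)\, |z|^2\bigr) \,\big|\, x\bigr]\bigr] \\
&= \mathbb{E}_x\bigl[\log(1 + |x|^2)\bigr] + \mathbb{E}_z\bigl[\log|z|^2\bigr]
\end{aligned}$$

with $\mathbb{E}_z[\log|z|^2] = -\gamma$.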

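Finally, since $\alpha[1 + \log(1 + \rho)] = 1$ and since [4, Eq. (337)]

$$\log\Gamma(\alpha) - \alpha \log\alpha + \log\alpha = o(1), \quad \rho \to \infty,$$

we get

$$\begin{aligned}
I(x; y) &\le \log\pi + \underbrace{\alpha[1 + \log(1 + \rho)]}_{=\,1} + \underbrace{\log\Gamma(\alpha) - \alpha \log\alpha + \log\alpha}_{=\,o(1),\ \rho\to\infty} - \log\alpha \\
&\quad + \underbrace{\mathbb{E}_{\mathsf{QW}}\bigl[\log|y|^2\bigr] - h(y\,|\,x)}_{=\,-\gamma - \log\pi - 1}
\end{aligned} \tag{24}$$

$$\begin{aligned}
&= -\log\alpha - \gamma + o(1), \quad \rho \to \infty \\
&= \log\log\rho - \gamma + o(1), \quad \rho \to \infty.
\end{aligned} \tag{25}$$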
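To see how slowly the $o(1)$ terms in (24) and (25) decay, one can evaluate the RHS of (24) exactly for the stated choice of $\alpha$ and compare it with $\log\log\rho - \gamma$. The following is a minimal numerical sketch with hypothetical function names; it only evaluates the closed-form terms and does not simulate the channel.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def duality_upper_bound(rho: float) -> float:
    """RHS of (24) for alpha = 1 / (1 + log(1 + rho)), keeping all o(1) terms."""
    alpha = 1.0 / (1.0 + math.log1p(rho))
    return (math.log(math.pi) + 1.0                     # log(pi) + alpha [1 + log(1 + rho)]
            + math.lgamma(alpha) - alpha * math.log(alpha)
            - EULER_GAMMA - math.log(math.pi) - 1.0)    # E[log |y|^2] - h(y|x)

def gap_to_loglog(rho: float) -> float:
    """Bound minus the leading terms log log(rho) - gamma appearing in (25)."""
    return duality_upper_bound(rho) - (math.log(math.log(rho)) - EULER_GAMMA)
```

The gap stays positive and shrinks as the SNR increases, but only slowly, which is consistent with the double-logarithmic nature of the expansion.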



This upper bound, which suffices to conclude that capacity grows
at most double-logarithmically in $\rho$, can actually be tightened.
A more careful choice of the output distribution makes the term
$\alpha[1 + \log(1 + \rho)]$ vanish as $\rho \to \infty$, so that $-\gamma$ in (25) gets
replaced by $-\gamma - 1$ (see [4, App. VII] for details). This modified
upper bound is tight in the sense that one can find a capacity lower
bound that matches it up to a $o(1)$ term (see [4, Thm. 4.16]). To
summarize, for $\mathbf{R} = \mathbf{I}_N$, we have that

$$C(\rho) = \log\log\rho - \gamma - 1 + o(1), \quad \rho \to \infty. \tag{26}$$

We conclude by noting that the double-logarithmic growth of
capacity in SNR holds for every full-rank channel covariance
matrix $\mathbf{R}$. Correlation among the channel entries, however,
results in a different constant term in (26). More specifically,
the final result in (5) follows from [4, Lem. 4.5] and from an
adaptation of [4, Thm. 4.41] to the block-fading setup considered
here.



IV. Open Problems 

Duality is the main tool we used to establish the novel capacity
expansion (5) for the full-rank case and to provide an alternative,
simple proof of (6) for the rank-1 case (i.e., the
piecewise-constant block-fading channel model). For the latter case, in
particular, we showed how the geometry of the communication
problem at hand can be used to find an output distribution that
yields an asymptotically tight upper bound. Finding a
$o(1)$-accurate capacity characterization when $1 < Q < N$ is an
interesting open problem.

Throughout the paper, we focused exclusively on the
single-antenna setup. In the multiple-antenna case, not even a pre-log
characterization is available when $1 < Q < N$. A pre-log lower
bound for the single-input multiple-output (SIMO) case has been
obtained recently in [8]. Surprisingly, the bound in [8] implies
that the SIMO pre-log can be larger than the pre-log in the
single-input single-output case. Establishing whether this bound
is tight is an open problem. For two further channel models of
practical interest, namely, the single-antenna frequency-selective
and doubly-selective fading channels, the state of affairs is similar:
pre-log lower bounds have been reported in [20] and [21],
respectively. Establishing whether these bounds are tight is again
an open problem.